Abstract


The applicability of exploratory sequential data analysis (ESDA) techniques
for analyzing usability test data is examined. ESDA techniques include
transition matrix analysis, lag sequential analysis, frequency of cycles,
graphical summarization techniques, and pattern analysis techniques. A
subset of each was used in analyzing the data from three usability studies.
The encoding schemes used, the analysis routines run, software tools to
support encoding and analysis (SHAPA and the Maximal Repeating Pattern
analysis tool), and their interactions are discussed, and the different
types of usability problems which can be extracted from the data when
analyzed with ESDA techniques are illustrated. It is concluded that the
ESDA techniques will be useful once the state of the art in software
support is able to provide the analyst greater flexibility in applying the
analysis routines. Without the ability to apply analysis routines to
multiple data levels, too much work is involved in obtaining a complete
analysis of usability problems at all levels.


1  Introduction 

A potential problem with some usability test methods, such as verbal
protocol techniques and user observation, is subjectivity in the data
analysis and interpretation (Holleran 1991: 352). To address this problem,
we have been experimenting over the years with understanding the
applicability of exploratory sequential data analysis (ESDA) techniques for
analyzing usability data. ESDA is a family of tools and techniques for
exploring sequential data collected in complex, dynamic, event-driven
environments (Sanderson 1991). Applying the techniques typically involves
transcribing and encoding recorded events, and applying statistical
analysis routines such as Markov analysis, lag sequential analysis, cycle
analysis, and pattern analysis techniques to the encoded data. It has been
proposed by various researchers (Siochi et al. 1991, Holleran 1991) that
such techniques could prove useful in the area of human-computer
interaction analysis.

While several studies have been documented in which Markov analysis, for
example, has been used (e.g., Hammer and Rouse 1979, Penniman 1975, and
Good 1985), we were unable to find a comprehensive guide or discourse on
the various ESDA techniques available and how they should be used in the
context of usability testing. If these types of techniques prove useful in
usability test data analysis, they would enhance the process of converting
logged usability test data into information that is less subjective, and
more rigorous and quantifiable, and would permit the use of automated
analysis tools. This paper describes three systems for which we have
performed usability tests and applied a subset of these ESDA techniques
during the data analysis phase, and our lessons learned from these
experiences. Our intent is to provide a perspective in this area that will
help other usability analysts decide whether it is worth the effort to
apply these techniques, and which techniques are most likely to produce
meaningful results for them. Our experiences may also help developers of
usability test tools understand the practitioner's needs, in terms of the
type of information we should extract from our usability data when
evaluating the usability of software systems.

2  ESDA  Techniques 

Sanderson (1991) explains ESDA as encompassing concepts from exploratory
data analysis philosophy, sequential data analysis (SDA) techniques, and
human-computer interaction. She proposes ESDA as a way of analyzing data
that is rich, complex and multi-dimensional and cannot be readily analyzed
with conventional statistical techniques. Characteristics of such data are
that events unfold over time and preservation of the temporal dimension is
important, the data can usually be analyzed at many different levels, and
you may not initially know the questions to be answered. SDA techniques
include methods for sampling, coding and analysis. The analytic methods
include time series, Markov, lag sequential, causal, cycle, grammars
(Sanderson et al. 1991) and pattern analysis and identification techniques.
Below we briefly describe the techniques we have used, whether or not data
needs to be encoded and the manual versus software-aided options for data
encoding and applying SDA techniques.

2.1 Sequential Data Analysis

The movement between states can be explored by modeling them as a finite
Markov chain, which is defined by Kemeny and Snell (1960: 201) as "a
stochastic process which moves through a finite number of states, and for
which the probability of entering a certain state depends only on the last
state occupied." Matrix analyses involve constructing transition frequency
or probability matrices to examine whether there are dependencies in the
data. The analysis may reveal habitual or stereotyped patterns of behavior
(Sanderson et al. 1989).

Lag sequential analyses (originally developed by Sackett 1974, in Sanderson
et al. 1989) permit vague patterns to be discerned in a data sequence. Such
analyses display the frequency with which one state occurs with respect to
another state, at various removes. If state 1 is the target state, the
analysis displays the frequency with which state 2 occurs directly before
state 1, two steps before state 1, directly after state 1, two steps after
state 1, etc. This helps determine patterns of behavior that may not be
strictly sequential (Sanderson et al. 1989).

Frequency of cycles (originally developed by Fisher 1988, in Sanderson et
al. 1989) looks for regularities in behavior sequences. This form of
analysis provides a report of actually occurring sequences of commands or
states in a single cycle defined by a target command or state. That is, if
the target state is state 1, the first and second occurrences of state 1
are identified and all the events in between are stored as a cycle. This is
repeated for the second and third occurrences of state 1, and so on until
all the cycles are identified. Each cycle is then compared for matches, and
the frequencies of each cycle are counted. The result is a listing of all
cycles, the number of times each occurred, and the sequence in which they
occurred (Sanderson et al. 1989). Each of these is illustrated in figure 1.
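
The transition matrix and lag sequential counts described above can be
sketched in a few lines of Python. This is only an illustrative sketch, not
the routine used by SHAPA or by the authors; the function names and the
sample event sequence are assumptions made for the example.

```python
from collections import Counter

def transition_counts(seq):
    """Count first-order transitions (state at i -> state at i + 1)."""
    return Counter(zip(seq, seq[1:]))

def transition_probabilities(seq):
    """Divide each transition count by the total for its source state."""
    counts = transition_counts(seq)
    row_totals = Counter()
    for (src, _dst), n in counts.items():
        row_totals[src] += n
    return {(src, dst): n / row_totals[src] for (src, dst), n in counts.items()}

def lag_counts(seq, target, other, max_lag=2):
    """For each non-zero lag in -max_lag..max_lag, count how often `other`
    occurs `lag` steps away from an occurrence of `target`
    (negative lag = before the target, positive lag = after it)."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag == 0:
            continue
        n = 0
        for i, state in enumerate(seq):
            j = i + lag
            if state == target and 0 <= j < len(seq) and seq[j] == other:
                n += 1
        result[lag] = n
    return result

# Hypothetical encoded input sequence (not data from the studies).
events = ["menu", "command", "button", "menu", "command", "eval", "menu", "button"]

freq = transition_counts(events)
prob = transition_probabilities(events)
lags = lag_counts(events, target="menu", other="command")
print(freq[("menu", "command")])   # 2
print(lags)                        # {-2: 2, -1: 0, 1: 2, 2: 0}
```

In this toy sequence "command" never directly precedes "menu" but twice
occurs two steps before it, which is the kind of non-adjacent regularity a
first-order transition matrix alone would not show.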

[Figure 1: (a) first-order transition matrix; (b) lag sequential analysis,
plotted against position; (c) frequency of cycles, tabulated below.]

  #    Freq  Cycle
  A.   1     Int.task -> eval. -> int.task
  B.   1     Int.task -> int.exec -> eval. -> int.percept -> eval. -> eval.
             -> int.task
  C.   5     Int.task -> int.exec -> tape -> tape -> command -> button ->
             eval. -> int.exec -> menu -> command -> eval. -> eval. ->
             int.task

Figure 1. Outputs from Three SDA Techniques: (a) First-Order Transition
Matrix, (b) Lag Sequential Analysis, and (c) Frequency of Cycles.
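
A frequency-of-cycles listing like the one in panel (c) can be produced
with a short routine. The sketch below is a minimal illustration under the
description given in the text (a cycle runs from one occurrence of the
target state to the next); the function name and sample sequence are
assumptions, not the SHAPA implementation.

```python
from collections import Counter

def cycle_frequencies(seq, target):
    """Collect the events from each occurrence of `target` to the next one
    (both endpoints included) and count how often each cycle occurs."""
    positions = [i for i, state in enumerate(seq) if state == target]
    cycles = [tuple(seq[a:b + 1]) for a, b in zip(positions, positions[1:])]
    return Counter(cycles)

# Hypothetical sequence of predicate names (not data from the studies).
encoded = ["int.task", "eval", "int.task", "int.exec", "eval",
           "int.task", "eval", "int.task"]

cycles = cycle_frequencies(encoded, target="int.task")
for cycle, freq in cycles.most_common():
    print(freq, " -> ".join(cycle))
```

Cycles are stored as tuples so that identical cycles compare equal and can
be counted directly.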

Another pattern analysis technique, maximal repeating patterns (MRP), was
developed by Siochi and Ehrich (1991). MRP analysis works on the premise
that repetition of user actions is an important indicator of potential user
interface problems. The technique detects the longest repeating sequences
of command strings in user session transcripts; it works on the user inputs
only. The method is unique in that the analyst does not identify a priori
the patterns or strings the technique should search for; all repeating
patterns are identified.
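
The idea can be illustrated with a simplified, brute-force sketch. It is
not the Siochi and Ehrich algorithm: here a pattern is assumed to be
"maximal" when it occurs at least twice and every one-command extension of
it (to the left or right) occurs strictly fewer times, and overlapping
occurrences are counted. The function name and the sample commands are
hypothetical.

```python
from collections import Counter

def maximal_repeating_patterns(seq, min_len=2):
    """Return (pattern, count) pairs, longest patterns first."""
    n = len(seq)
    # Count every contiguous subsequence of length min_len .. n - 1.
    counts = Counter(
        tuple(seq[i:i + k])
        for k in range(min_len, n)
        for i in range(n - k + 1)
    )
    symbols = set(seq)
    result = {}
    for pattern, c in counts.items():
        if c < 2:
            continue  # not a repeating pattern
        extensions = [(s,) + pattern for s in symbols]
        extensions += [pattern + (s,) for s in symbols]
        # Keep the pattern only if no extension repeats as often as it does.
        if all(counts.get(ext, 0) < c for ext in extensions):
            result[pattern] = c
    return sorted(result.items(), key=lambda item: (-len(item[0]), item[0]))

# Hypothetical command transcript (user inputs only).
commands = ["open", "zoom", "pan", "save", "open", "zoom", "pan",
            "quit", "zoom", "pan"]
patterns = maximal_repeating_patterns(commands)
for pattern, count in patterns:
    print(count, " ".join(pattern))
```

In this example "open zoom pan" is reported because it cannot be extended,
and "zoom pan" is also reported because it occurs once on its own, outside
the longer pattern; "open zoom" is dropped because it only ever occurs
inside "open zoom pan".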

2.2 Graphical Summarization

We have previously proposed graphical summarization techniques to aid in
analyzing sequential data in the computer-aided design task domain (Cuomo
and Sharit 1989). These ideas, based on the human problem solving concepts
presented by Newell and Simon (1977), included task movement graphs, task
rate graphs, and analysis of inter-event intervals, together with Markov
process analysis. The graphical output of these routines is illustrated in
figure 2.

The first two graphical techniques are based on the concept of moving
through a design or task state space toward a goal. Recorded subject-input
events are coded in terms of whether the event moved the user a step
forward toward a goal, whether the command was a result of using a computer
and did not move the user forward a step (e.g., moving a window), or
whether the event caused the user to move backwards, away from the end goal
(e.g., delete something). The overall slope of the graph indicates the
efficiency with which a task is performed. Rate measures can also be
constructed which show absolute forward movement of task progress over a
time period, where time periods are the total time for discrete task
segments. Finally, inter-event interval analysis involved plotting the
elapsed time between subject-input events against interval numbers. Long
delays between events can then be identified, flagging potential problem
areas requiring further investigation.


[Figure 2 x-axes: (a) Event #; (b) Time (Min); (c) Event Interval]

Figure 2. Outputs from Three Graphical Summarization Techniques: (a) Task
Movement Graph, (b) Task Rate Graph, and (c) Inter-Event Interval Graph.
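
The data behind a task movement graph and an inter-event interval graph can
be computed as sketched below. This is a minimal illustration, not the
authors' FORTRAN routines: it assumes each event is coded as one unit step
forward, no movement, or one unit step backward, and the code names,
threshold, and sample data are hypothetical.

```python
from itertools import accumulate

# Assumed step size for each movement code (one unit per event).
STEP = {"forward": 1, "system": 0, "backward": -1}

def task_movement(codes):
    """Cumulative task position after each event (y-values of the graph)."""
    return list(accumulate(STEP[c] for c in codes))

def inter_event_intervals(timestamps):
    """Elapsed time between consecutive events."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def long_delays(intervals, threshold):
    """(interval number, duration) for intervals above the threshold;
    interval numbers start at 1."""
    return [(i, d) for i, d in enumerate(intervals, start=1) if d > threshold]

codes = ["forward", "forward", "system", "backward", "forward", "forward"]
positions = task_movement(codes)
efficiency = positions[-1] / len(codes)   # overall slope of the graph

seconds = [0, 2, 4, 6, 8, 13, 19, 20, 22, 23]   # illustrative time stamps
intervals = inter_event_intervals(seconds)
flagged = long_delays(intervals, threshold=4)

print(positions)    # [1, 2, 2, 1, 2, 3]
print(flagged)      # [(5, 5), (6, 6)]
```

Plotting `positions` against event number gives the task movement graph,
and plotting `intervals` against interval number gives the inter-event
interval graph; any plotting package can be used for the display step.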

To apply any of these analytic routines, software can be written in any
conventional software language and results output and displayed with
commercial database and graphing packages. For some routines, e.g.,
inter-event interval analysis, this approach is straightforward and easy.
Having several stand-alone analysis routines, however, does not readily
support the exploratory aspects required for a complete analysis of the
data, and writing analysis code can add time to an already time-consuming
process. Nevertheless there may not be alternatives. Few mature software
tools designed to support ESDA techniques are available. The tools we were
able to obtain are discussed later in the paper.

2.3  Encoding  the  Data 

The type of data collected, the analytic technique itself, and the
questions to be answered determine whether the technique can be applied
directly to the collected usability data, or whether the data needs to be
encoded first. When collecting verbal protocols, for instance, the content
of the sentences and actions of interest need to be extracted and encoded
before ESDA techniques can be applied; statistical techniques obviously
cannot be applied directly. When files of actual mouse inputs and
keystrokes or command input files are collected, some of the analysis
techniques can be applied without encoding the data. Siochi et al. (1991),
for instance, applied their MRP routines to unencoded command input files;
such files may need some filtering to be put in a form acceptable to the
analysis program.

The task rate and task movement graphs, although applied to similar types
of user input files, require encoded data, since the task movement
implications of each input must be determined.

For a technique such as transition matrices, the purpose of the analysis
will determine whether the data needs to be encoded. If the goal is to
determine the most commonly occurring input pairs, for instance, the
technique should be applied to the actual user input sequences. If patterns
of higher-level behavior are sought, the data should be coded
appropriately; in fact data from several sources may be used to support
application of the encodings to the data.

When encoding data files, selection of the encoding scheme is a critical
factor influencing the questions that can be answered by the data. An
appropriate scheme is usually established iteratively. Once a scheme is
decided upon, the analyst can apply it manually by examining the data
files, choosing the appropriate code for each input or group of inputs, and
typing the software codes into a new file. Alternatively, tools to support
data encoding can be written, or existing tools used.


2.4 Software for Supporting Data Encoding and SDA Techniques

As it is not practical to apply these ESDA techniques or complex encoding
schemes manually, software support is required for both the data encoding
and application of the analysis routines. How the software applies the
techniques to the data, and the amount of flexibility in the software,
however, affect the resultant usefulness of the analysis routine outputs.

One tool we were able to obtain that supports both encoding and analysis
was SHAPA (Software for Heuristically Aiding Protocol Analysis) Version
2.0, developed by the University of Illinois at Urbana-Champaign
Engineering Psychology Research Laboratory. SHAPA is a protocol analysis
environment permitting researchers to encode data in any way they choose.
It works on single-stream, un-timestamped verbal and non-verbal protocols,
running on an IBM PC or compatible. A new version currently under
development, MACSHAPA, will include the ability to analyze multiple stream,
time-stamped data from video records as well as ASCII files (Sanderson et
al. 1991).

To encode data with SHAPA, the researcher first decides on the encoding
scheme, then specifies the codes, called predicates, on the predicates
screen. A predicate consists of the name followed by its arguments or
values in parentheses, and might look like: INT.TASK(1-1.1-setdate), or
MENU(view, m). By having the data analyst predefine the predicates and
their values, SHAPA can allow the user to type in an abbreviation for each
predicate, leaving the software to complete the name. Once all the files
are encoded, running the analysis routines is as easy as selecting the
desired routine from the Reports menu. SHAPA supports transition matrix
analysis, lag sequential analysis, frequency of cycles, value lists, and
predicate instances. The value list routine generates a report which, for
each predicate, lists all the constant values used as predicate arguments
and their frequency of use. The collection of predicate instances routine
collects segments encoded with the same predicate and displays the line
number on which they occurred; values can also be specified as being of
interest (James et al. 1990). Except for these last two routines, which do
take account of the predicate values, the other routines all report on
predicates only, not their specific values. (See any of the Sanderson et
al. references or James, Sanderson, and Seidler 1990 for more detailed
descriptions of SHAPA and its capabilities.)
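
To make the predicate notation and the two value-aware reports concrete,
the following sketch parses predicates of the form NAME(value, value, ...)
and produces a value list and a list of predicate instances. It is an
illustration only and does not reproduce SHAPA's file format or report
layout; the regular expression, function names, and sample lines are
assumptions.

```python
import re
from collections import Counter, defaultdict

# Assumed syntax: a name made of letters, dots and underscores, followed
# by comma-separated values in parentheses.
PREDICATE = re.compile(r"^\s*([A-Za-z_.]+)\s*\((.*)\)\s*$")

def parse_predicate(line):
    """Split 'MENU(view, m)' into ('MENU', ['view', 'm'])."""
    match = PREDICATE.match(line)
    if match is None:
        raise ValueError(f"not a predicate: {line!r}")
    name, args = match.groups()
    values = [v.strip() for v in args.split(",")] if args.strip() else []
    return name, values

def value_lists(lines):
    """For each predicate name, count how often each value was used."""
    report = defaultdict(Counter)
    for line in lines:
        name, values = parse_predicate(line)
        report[name].update(values)
    return report

def predicate_instances(lines, wanted):
    """Line numbers (starting at 1) encoded with the predicate `wanted`."""
    return [n for n, line in enumerate(lines, start=1)
            if parse_predicate(line)[0] == wanted]

encoded_lines = ["MENU(view, m)", "COMMAND(layout, m)", "BUTTON(layout-add)",
                 "MENU(view, m)", "BUTTON(layout-ok)"]
report = value_lists(encoded_lines)
button_lines = predicate_instances(encoded_lines, "BUTTON")
print(dict(report["MENU"]))   # {'view': 2, 'm': 2}
print(button_lines)           # [3, 5]
```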

A second software package we were able to obtain for our usability lab was
the Maximal Repeating Pattern (MRP) Tool developed at the Department of
Computer Science at Virginia Polytechnic Institute and State University,
which supports extraction of MRPs from logged user input data. We used the
version designed to run on UNIX-based systems. To use the application, the
logged user input data is normalized to convert raw transcripts to a
standard form. Siochi et al. (1991) used the tool to analyze data patterns
on collected user sessions on a command-based image processing system
called GIPSY. They normalized the raw transcript files by extracting inputs
from the collected user inputs and system outputs, and then extracted
single-word commands from the command argument pairs in the transcript. The
software then identifies all the MRPs in the file. MRPs are defined as
repeating patterns that are as long as possible, or are independently
occurring substrings of longer patterns (Siochi et al. 1991: 316). Their
outputs include each identified MRP in order of decreasing length, and the
number of instances. Summary information includes the number of MRPs found
and their minimum, maximum, and average length. The analyst can filter
MRPs, examine specific instances of each, and get more details on a
specific one, following a pointer back to its instance in the raw
transcript file. This technique tends to generate large amounts of data
which need to be filtered. The MRP developers estimate that one MRP is
found for every 20 to 25 command lines.


3 Usability Testing of Three Software Systems

In the past four years, we have conducted three formal usability studies on
a variety of systems and, for each, have used some subset of ESDA
techniques to analyze the collected usability data. All systems were
similar in possessing graphical, direct manipulation style user interfaces.
All supported realistic tasks that were user-controlled as opposed to
system-controlled, and none of the tasks were time critical. All the tasks
were also ill-defined, in the sense that there could be many correct
solutions to the problems: we expected high degrees of variability across
participants in terms of both the problem-solving approach taken and the
computer-use strategies. Task times for each test ranged from one and a
half hours to approximately four hours. Finally, all participants were
representative of the intended user population, and the participants in
each test were trained to use the system by the usability testers. They had
not used the systems previously.

Each system and the usability test procedures used are described briefly
below. Detailed descriptions of procedures and usability problems
identified can be found in the referenced usability test reports. Our focus
here is on the way in which data encoding was performed and the
effectiveness of applying ESDA techniques.

3.1 Computer-Aided Architectural Design System

A usability study of a commercial computer-aided architectural design
(CAAD) system was performed, in which six architectural design students
performed two design tasks at one of three levels of complexity (Cuomo and
Sharit 1989). Although verbal protocols, user keystrokes and stylus inputs
were collected, only the keystrokes and stylus inputs were used for the
application of SDA techniques.

Two encoding schemes were used in this study, one a subset of the other. In
the first scheme, each user input action was assigned one code. The ten
original codes were: error, [illegible], split/join, perceptual, motor,
query, drawing, system, backward one step, backward many steps. Some codes
reflected actions taken on the part of the user to reduce different types
of information processing loads such as perceptual (e.g., zooming reduces
perceptual workload) and motor loads, while other codes reflected forward
or backward movement to produce a design drawing, and finally, codes were
applied to inputs which are an artifact of using a computer (system code).

When the codes were applied to the data it transpired that two of the codes
were used very infrequently: for subsequent analysis, the codes of query
and split/join were collapsed into the system code. For each user's encoded
data file, we generated first-order transition matrices such as the one
shown in figure 1. To apply the graphical summarization techniques, we
further simplified the encoding techniques so that each input was
classified into one of three states: a forward movement, a backward
movement, or [...]. Inter-event interval analysis was applied to the time
stamps of each input, and the encoded data was not involved. The data
encoding for this study was performed manually and the analysis routines
were written in FORTRAN.

3.2 Simulation and Rapid Prototyping System

We performed a second usability study which involved ESDA techniques on a
beta version of a software system developed to support prototyping and
simulations with graphical display. Five participants with varying degrees
of computer and simulation experience participated. For this usability
study we were able to collect only verbal protocols and a videotape of the
display. The videotape was transcribed, including both participants'
comments and actions, and then segmented and encoded with 25 predicates.
The predicates were a mix of user interface object related encodings (e.g.,
menu selections, new interface, dialog completion, save); task activities
related to building the simulation model, such as load model entity, object
and process definition and manipulation, statement modification and
selection, and process modification; user activities (display examination,
error, error correction and identification, materials reference, pauses,
subject comments, requests for help, searches); and experimenter comment.
Some of the predicates had values defined as well. The data was encoded
using SHAPA. All the SDA techniques available in SHAPA were run on the
encoded data, in addition to calculation of the more traditional usability
measures such as frequency of usability tester intervention.

3.3 Airspace Scheduling System

Our most recent and sophisticated usability test, measured by attention
given to developing an encoding language and assessing the applicability of
the ESDA techniques, was performed on a prototype military airspace
scheduling system (Cuomo and Bowen 1993). Four subject-matter experts and a
user-system interface (USI) expert participated in this usability study.
Verbal protocols, a videotape of the display, and time-stamped user
keystrokes and mouse inputs were collected. The two data streams were
integrated before application of an encoding scheme, allowing the user
actions to be structured within a task context. Many of the users'
psychological intentions were identified, allowing hierarchical
segmentation of the input actions.

The 21 encodings used, which were loosely based on Norman's (1986) stages
of user activity model, were divided into two levels and are shown in table
1. The first level included the semantic-level predicates; they provided
information on the user's overall strategy, where and what types of errors
were made and whether they were recovered from, what tasks were performed
within each goal (task intentions), what computer steps (intentions to
execute) were attempted in performing each task, and an evaluation of the
success of each task and each execution sequence performed within that
task. The articulatory-level encodings focused on the actual sequences of
commands and user inputs for each intention to execute, and reflected
generic user interface object usage. Each of the predicates also had
detailed values which usually included the actual instance of the more
generic predicate type. For instance, the "task intention" predicate
included values for the goal number, the task number, and the name of the
task the subject was performing (e.g., scheduling a particular mission,
resolving a conflict). The "evaluate" predicate values included the goal,
task, and execution number, the name of the task or execution being
evaluated, and the evaluate state; possible states were abort, incomplete,
OK, or wrong.

The articulatory-level values included the actual instance of the user
interface object type being selected as well as the input device code
(whether the mouse or keyboard was used, "m" or "k"). The "command"
predicate, for example, had as values the actual command name selected
followed by the input device code. The predicate "fields" had the name of
the dialog box it resided in, the field name, and the user function
performed in it (data entry into blank field, deleted information in the
field, edited information in a populated field, or field was selected but
not modified) as its three values. The encoding scheme and its hierarchical
nature are illustrated in figure 3. SHAPA was again used as the encoding
tool; its analysis routines were run on some of the encoded data files. We
also experimented with the MRP tool by applying it to the participants'
uncoded data files. The data files contained the numbered lines shown in
the right-hand side of figure 3, but with the time information removed.


Table 1. Encodings Used for the Airspace Scheduling Study

Encoding                                  Definition

Semantic level
Goal                                      Scenario step.
Task intention (Int.task)                 An intention to complete one task contributing to
                                          the completion of a goal.
Perception intention (Int.per)            An intention to improve the perceptibility of a
                                          display.
Intention to execute (Int.exc)            One computer step (may be comprised of multiple
                                          actions) leading to the completion of a task
                                          intention. Several steps may be required per task
                                          intention.
Evaluate (Eval)                           The success with which the intention was
                                          accomplished.
Error in intention (Err.int)              The intention was incorrect and will not
                                          accomplish the goal.
Error in action specification (Err.acsp)  Wrong sequence of actions to accomplish the
                                          intention to execute.
Error in execution (Err.exec)             Manual, motor error in executing.
Error in perception (Err.per)             Break-down in human perceptual processing of
                                          information on a display.
Error in interpretation (Err.inter)       User fails to interpret system state correctly.
Error in evaluation (Err.eval)            User mistakenly thinks has or has not moved
                                          closer to the goal.
Recovered error (Rec.err)                 Error was detected and recovered from.

Articulatory level
Menu                                      A menu was opened.
Command                                   A command was selected.
List-select                               An item is selected from a list.
Button                                    A button was selected.
Field                                     An action was taken in a field.
Scroll                                    A scroll bar action was performed.
Tape                                      A mission icon.
Timebar                                   Manipulation of the timebar which controls
                                          horizontal scrolling in schedule.
Form                                      The background area of the schedule.


(left) Encoded data

GOAL(1-setdate)
  INT.TASK(1-1-setdate)
    INT.EXEC(1-1.1-setdate)
      MENU(view,m)
      COMMAND(date,m)
      BUTTON(date-cancel)
      ERR.EVAL(e1-1-1.1-setdate-thought needed airspaces on display)
    EVALUATE(1-1.1-setdate-abort)
    INT.EXEC(1-1.2-setlayout)
      MENU(view,m)
      COMMAND(layout,m)
      LIST_SELECT(layout-undis)
      BUTTON(layout-add)
      LIST_SELECT(layout-undis)
      BUTTON(layout-add)
      BUTTON(layout-ok)
    EVALUATE(1-1.2-setlayout-ok)

(right) Verbal protocol and numbered, time-stamped user inputs

"Alright. Okay, so I want to that week."
001 11:32:39 000 Pressed Button on View Button in Main Menu Bar
002 11:32:41 002 Released Button on Date Button in View Menu
003 11:32:43 002 Pressed Button on Cancel Button in Date Dialog
"Well, I probably need airspaces up there first."
004 11:32:45 002 Pressed Button on View Button in Main Menu Bar
005 11:32:47 002 Released Button on Change Layout Button in View Menu
"Who am I again? Phoenix"
006 11:32:52 005 Pressed Button on Undisplayed SUA List in General Layout Dialog
"Ah. Yankee 1."
007 11:32:58 006 Pressed Button on Add Button in General Layout Dialog
008 11:32:59 001 Pressed Button on Undisplayed SUA List in General Layout Dialog
"Ah, Yankee 2."
009 11:33:01 002 Pressed Button on Add Button in General Layout Dialog
010 11:33:02 001 Pressed Button on OK Button in General Layout Dialog

Figure 3. Sample of the Encoded Data from the Airspace Usability Study

The first study using the CAAD system had a slightly different focus than
the other two. It looked at the interaction between an architect's mental
design activities and the use of a computer tool to support those
activities. It was the only study that had variables: two different types
of design tasks at three different levels of complexity. The second two
studies were basic usability studies performed to identify areas in the
system design that hindered users' task performance. There were no
conditions to compare results across, and no pre-determined questions to
answer or hypotheses to prove.

4 Applications of ESDA Techniques to the Collected Usability Test Data

Sequential data analysis techniques are potentially useful for analyzing
usability data. To fully analyze the usability of a system, however,
requires analysis of, and information on, the human-computer interaction
process at several levels. At the highest level is information on the
user's goals, intentions, and other high-level psychological processes. The
next level involves the user's computer-use strategies and understanding
how well the system meets the user's needs to carry out each task intention
and convey information. Data files of actual input actions can provide only
some of this information. Performing SDA on unencoded data files of user
input actions will reveal information on usability at a low level, and only
on certain types of problems. Repeated patterns, execution errors, etc. can
be seen, but the context of the user's intentions is lost. Performing the
wrong sequence of actions to accomplish an intention, for instance, is a
type of error that will not be detected since the intention is not known.

Choosing a complex, hierarchical encoding
scheme such as that used for the airspace
scheduling study offered potential for data
analysis at several levels. The generic encoding
scheme used, however, was a poor match with
SHAPA, which uses only predicate names for
many of its routines and ignores the predicates'
values. Much of the detailed information was in
the predicate values, so the resulting outputs of
the SDA techniques were ambiguous and hard to
interpret. Furthermore, some information useful
for assessing usability was not generated by any
of the software-supported techniques.

To fully analyze the airspace usability data,
we were therefore forced to resort to manual
methods (Cuomo et al. 1993). SHAPA also had
file size limitations which forced us to break a
single user's encoded data file into many separate
files; hence report outputs had to be manually
integrated. Below we discuss the different ESDA
techniques we tried and explain the types of
usability problems they were able to detect as
well as their shortcomings.

4.1  Transition  Matrices/Lag  Sequential  Analysis 

The usefulness of the transition matrices
varied from study to study. In the CAAD study,
error-to-error transition probabilities turned out to
be the most useful usability indicator, as they
reflected an important aspect of usability, error
recovery. The most skilled participant had a 0%
probability of moving from one error state to
another, while some participants had
probabilities as high as 21%. The simulation
study usability analyst was able to discern only
one usability problem from the transition matrix
technique, and it was identified from the
second-order transition matrix output. For all users,
the frequency of the combination "new interface,
save, new interface" was high. This sequence
reflected the modal nature of this system design,
which only allowed users to save their work from
one screen. Users would switch screens, perform
their save function, and switch back to their
original screen.
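
The two computations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the two sample sequences are hypothetical, invented for this example, and are neither SHAPA's implementation nor data from the studies. The first function estimates first-order transition probabilities (from which an error-to-error probability can be read), and the second counts runs of three consecutive events, which is the information a second-order transition matrix tabulates.

```python
from collections import Counter, defaultdict

def transition_probabilities(events):
    """First-order matrix as {from_state: {to_state: probability}}."""
    pair_counts = Counter(zip(events, events[1:]))
    row_totals = Counter()
    for (src, _dst), n in pair_counts.items():
        row_totals[src] += n
    matrix = defaultdict(dict)
    for (src, dst), n in pair_counts.items():
        matrix[src][dst] = n / row_totals[src]
    return dict(matrix)

def ngram_counts(events, n):
    """Count every run of n consecutive events (n=3 gives second-order transitions)."""
    return Counter(zip(*(events[i:] for i in range(n))))

# Hypothetical encoded sessions, invented for illustration only.
caad_like = ["menu", "command", "error", "error",
             "menu", "command", "error", "menu"]
matrix = transition_probabilities(caad_like)
# Share of error events that are immediately followed by another error.
error_to_error = matrix["error"].get("error", 0.0)

sim_like = ["new interface", "save", "new interface", "edit",
            "new interface", "save", "new interface"]
triples = ngram_counts(sim_like, 3)
switch_save_switch = triples[("new interface", "save", "new interface")]
```

In this toy data one of the three error events is followed by another error, and the switch-save-switch triple occurs twice; a real analysis would of course run over complete encoded sessions.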

Transition matrices were also not very useful
in analyzing the military airspace system data.
One type of usability problem that could be
extracted was the frequency with which a "menu"
activity followed another "menu" activity, or any
redundant double action. The menu example may
indicate that menus are being searched in an
attempt to locate the desired command.
Repetitions of other actions could indicate a lack
of system feedback or slow system response
time. For the airspace study, the generic nature
of the predicate names provided no information
on the actual instances of each action. From
patterns such as "menu menu" or "button button"
we could not determine which menus and buttons
were activated, or even be sure if they were the
same or different objects.

Analyzing data of this type is also difficult
because of the large number of natural patterns
that occur during the use of direct manipulation
interfaces (e.g., command follows menu). The
large number of these obvious or expected
patterns, with their high frequencies, makes it
difficult to identify the often less-frequently
occurring potential usability indicators; there is
much noise in the data.

Reviewing the literature to determine others'
success using transition matrices in
human-computer interaction analysis revealed that
the technique was used most frequently to describe
users' behavior patterns but not necessarily to
determine usability problems. Good (1985) used
uncoded command transition frequency data to
determine the most common transitions between keys.
This information was used in designing a new
keyboard layout. Hammer and Rouse (1979)
used the technique to assess how researchers used
editors in writing their own programs and
texts. They created 16 states for their Markov
model involving functions such as typing,
positioning, deleting and inserting, and
searching. They found differences in patterns of
behavior between editors, between tasks, and
among users. They did not appear to use the data
to identify usability problems. Finally,
Penniman (1975) used the technique to analyze
users' search behavior on an on-line retrieval
system. He used both an 11-state and a 4-state
model. Again, he found variations in users'
patterns of behavior in comparing both sessions
of different length and different parts of single
sessions. As Penniman noted, the technique
provides a quantitative, statistical rigor for
comparing behavior across samples of different
types, and thus helps to describe user behavior.

For transition matrices to be useful in
analyzing usability data, behavior must be
compared between two systems or across time, or
there have to be certain defined and detectable
behavior patterns that alone can indicate usability
problems. For instance, if users are studied for a
long time, changes in their patterns of behavior
can reflect system learnability or the effect of
increased experience. Examples of behavior
patterns that reflect potential usability problems
were the error-to-error transition in the CAAD
study, excessive switching between interfaces to
perform a single function in the simulation
study, or repeating the same action successively.

The lag sequential analysis revealed no
usability problems for the airspace scheduling
study. For the simulation study, which used
more detailed predicates, this routine was found
to be useful when run with respect to errors, as it
helped to identify what activities preceded errors.
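
The idea of running a lag analysis "with respect to errors" can be illustrated with a short Python sketch. It is a simplification under stated assumptions: it tabulates only raw counts of the events found a fixed number of steps before (negative lag) or after (positive lag) each target event, whereas a full lag sequential analysis would also compare those counts with the frequencies expected by chance. The function name and the sample sequence are hypothetical.

```python
from collections import Counter

def lag_counts(events, target, lag):
    """Count the events found `lag` steps away from each occurrence of `target`.

    A negative lag looks backward (what preceded the target),
    a positive lag looks forward (what followed it).
    """
    counts = Counter()
    for i, event in enumerate(events):
        j = i + lag
        if event == target and 0 <= j < len(events):
            counts[events[j]] += 1
    return counts

# Hypothetical encoded session, invented for illustration only.
session = ["menu", "command", "field", "error",
           "menu", "command", "field", "error",
           "button", "error"]

one_before = lag_counts(session, "error", -1)   # activity just before each error
two_before = lag_counts(session, "error", -2)   # activity two steps before
```

Here "field" immediately precedes two of the three errors, which is the kind of lead an analyst would then follow up in the encoded file.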

4.2  Frequency  of  Cycles 

In addition to the predicate name-only
limitation, SHAPA had the additional
shortcoming of calculating frequency of cycles
only between instances of the same predicate -
for example, goal to goal or menu to menu, but
not menu to button. This greatly limited the
usefulness of this analysis technique for our
studies. In the airspace scheduling study, in one
participant's (the USI expert) data we found a
pattern that occurred 62 times: intention to
execute -> mission icon -> menu -> command ->
evaluate -> intention to execute. This was the
basic sequence of activities needed to schedule the
displayed airspace requests (also called mission
icons). Acting on the intention, the user selects
first a mission icon, then the schedule menu, and
then the appropriate scheduling command (deny,
approve, describe conflict, etc.). The user's cycle
was completed with an evaluation of the success
of the intention. How often this cycle recurs is
important, as it indicates a highly repetitive
pattern of behavior that could be reduced or
eliminated by allowing a single command to be
applied to many objects simultaneously. If the
detailed values of each of these commands had
been included, for instance the name of each
mission icon, this cycle would not have occurred
with high frequency, since the cycles would no
longer be identical (unless the user scheduled the
same mission 62 times, which is not likely but
also cannot be determined from this analysis).

On  the  other  hand,  if  the  detailed  value 
information  is  not  included,  there  could  be 
important  differences  in  these  cycles  which  are 
not  identified.  We  do  not  know,  for  example,  if 
the  command  selected  was  the  approve  or  deny 
command. 

Another example illustrating the need for
more analyst control over the level of cycle to be
found was: intention to execute -> menu ->
command -> field -> field -> field -> field ->
button -> evaluate -> intention to execute. This
cycle indicates that a dialog box was opened
(menu, command), four data fields were accessed,
and the box was closed (button, evaluate). We do
not know specifically which dialog box was
opened, which fields were accessed, or even
whether they are the same or different fields.
Little is therefore learned from this cycle. On the
other hand, the generic cycle intention to execute
-> menu -> command -> button -> intention to
execute, if it occurs repeatedly, suggests that
users are opening dialog boxes but not physically
interacting with them or changing any data in
their fields. This could mean that users are
opening dialog boxes for the sole purpose of
reading information contained in them, or that
they opened the wrong box, realized it, and then
closed it. The former could mean that some
critical task information needs to be moved up to
the main display or the next higher level, so it is
more readily available. The latter may mean that
the names of the commands for accessing the
dialog boxes are confusing, so that users are
having difficulty discriminating among them.

While for both instances we can detect a
general trend, we do not know which dialog
boxes are affected or how many. If we encoded
the data along specific occurrences only, the
general pattern would not show up. With the
example given in the transition matrix section
for the simulation usability study, the encoded
command "save" allowed us to discover the
problem of having to switch interfaces for the
sole purpose of saving. If the users were also
switching interfaces to perform some other
command, we would miss this activity unless we
selected that command to be encoded as well.
Using the more generic predicate "command" as
an encoding would have shown all occurrences
of this pattern, but further investigation would be
needed to determine which commands were
involved. The optimal condition to maximize
this routine's effectiveness would be to allow the
analyst to run the frequency of cycles at a variety
of levels.
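
The trade-off between generic and specific cycles can be made concrete with a small Python sketch that counts cycles between successive instances of one predicate at either level. It is a hedged illustration only: the (predicate, value) tuple representation, the function name, and the sample events are assumptions made for this example and do not reproduce SHAPA's routine or our encoded data.

```python
from collections import Counter

def cycle_counts(events, anchor, use_values=False):
    """Count the event sequences running from one `anchor` predicate to the next.

    `events` is a list of (predicate, value) tuples. With use_values=False the
    cycles are compared on predicate names only; with use_values=True the
    values must match as well.
    """
    starts = [i for i, (pred, _val) in enumerate(events) if pred == anchor]
    cycles = Counter()
    for a, b in zip(starts, starts[1:]):
        segment = events[a:b + 1]
        if use_values:
            cycles[tuple(segment)] += 1
        else:
            cycles[tuple(pred for pred, _val in segment)] += 1
    return cycles

# Hypothetical encoded events, invented for illustration only.
events = [
    ("intention", "schedule"), ("icon", "Bravo77"), ("menu", "schedule"),
    ("command", "approve"), ("evaluate", "ok"),
    ("intention", "schedule"), ("icon", "Yankee1"), ("menu", "schedule"),
    ("command", "deny"), ("evaluate", "ok"),
    ("intention", "schedule"),
]

generic = cycle_counts(events, "intention")                    # names only
detailed = cycle_counts(events, "intention", use_values=True)  # names and values
```

At the generic level the two cycles collapse into one pattern with a count of two; at the detailed level they are two different cycles that each occur once, so the repetition is no longer visible but the approve/deny distinction is.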

4.3 Graphical Summarization Techniques

The task movement and task rate graphing
techniques, which were used only in the CAAD
study, were found to be a good way of
summarizing the users' progress toward their
goals, in terms of both the time and number of
user inputs. Global usability problems, such as
a high ratio of system commands to actual task
commands, can be seen. These techniques,
however, are too generic and high-level for
directly indicating specific usability problems.
Again, unless two or more systems are being
compared, many users' data is needed to determine
whether these problems are due to the system
design or the users' use of the system.

We generated inter-event interval graphs for
the CAAD and airspace studies. These graphs
plot the time lags between each user input event.
The presence of long delays may point the
analyst to areas of human-computer interaction
where the users are experiencing difficulty and
which may warrant further investigation. This
technique was useful in the CAAD study, as we
could see the effect of task complexity on the
frequency and duration of the long inter-event
times. In this case, the long lag times were due
to the designers using the time to problem solve
and think up design solutions to meet the
requirements. We were also able to divide the
graph into discrete task activity areas, to see
which activities were most affected by the
increasing task complexity.

The technique was less effective for the
airspace study, because this study was performed
on a prototyped system and its software
performance was not maximized. Thus,
redrawing the complex screens caused a longer
than normal time delay and introduced a lot of
noise into our graphs. Some of the long time
lags were, however, due to the users referencing
written manuals and provided materials or
attempting to recover from usability problems.
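
The quantity plotted in an inter-event interval graph is simple to compute from logged timestamps, as the following Python sketch shows. The timestamps are modeled on the ten-event sample log in Figure 3; the function name and the 5-second threshold for a "long" delay are assumptions chosen purely for illustration.

```python
from datetime import datetime

def inter_event_intervals(timestamps):
    """Return the gaps, in whole seconds, between consecutive HH:MM:SS timestamps."""
    times = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]

# Timestamps modeled on the ten logged events of the Figure 3 sample.
stamps = ["11:32:39", "11:32:41", "11:32:43", "11:32:45", "11:32:47",
          "11:32:52", "11:32:58", "11:32:59", "11:33:01", "11:33:02"]

gaps = inter_event_intervals(stamps)

# Flag delays at or above an assumed threshold, reporting the 1-based
# number of the event that ended each long gap.
LONG_DELAY_SECONDS = 5
long_gaps = [(i + 2, gap) for i, gap in enumerate(gaps)
             if gap >= LONG_DELAY_SECONDS]
```

In this excerpt the two longest gaps end at events 006 and 007, which is where the user was reading through the list of undisplayed airspaces. As noted above, such gaps only point to places worth inspecting; they do not by themselves distinguish user difficulty from slow screen redraws.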

4.4 Value List and the Collection of Predicate
Instances

The SHAPA value list routine generates a
report on the number of occurrences of each
constant for each predicate. One of two SHAPA
routines where values were used, this is a detailed
frequency counter that is always helpful for
usability testing. The value list for the "task
intention" predicate, for instance, lists all the
instances of the users' task intentions and their
frequencies. This was useful in the airspace
study for counting error types, as six error
classifications were used as predicates; it also
provided counts of the specific instances of each
error type within the six classifications. The
value list is also helpful for providing
information on the most frequently used
commands, as well as the frequency of events
which are considered to be usability problems.
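
A value list amounts to a frequency table of constants grouped by predicate. The Python sketch below shows one way such a table could be built; it assumes events are available as (predicate, value) pairs, and the function name and sample events are hypothetical rather than SHAPA's actual report format.

```python
from collections import Counter, defaultdict

def value_list(events):
    """Return {predicate: Counter of values} for a list of (predicate, value) pairs."""
    table = defaultdict(Counter)
    for predicate, value in events:
        table[predicate][value] += 1
    return table

# Hypothetical encoded events, invented for illustration only.
events = [
    ("evaluate", "ok"), ("evaluate", "abort"), ("command", "approve"),
    ("evaluate", "ok"), ("command", "deny"), ("command", "approve"),
    ("evaluate", "wrong"),
]

table = value_list(events)
most_used_command = table["command"].most_common(1)[0][0]

# Evaluations that did not end in "ok" hint at activity that went badly.
not_ok = sum(n for state, n in table["evaluate"].items() if state != "ok")
```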

In the airspace study, for instance, when we
encoded the data we tried to differentiate the
reason for certain events' occurrence. Some
events are executed routinely in the normal
course of interaction, but sometimes the same
events are performed, for example, to improve
perceptibility. The distinction is important,
because in one case it indicates a potential
usability problem or an area that could be
improved, while in the other case it may not.

For instance, when the timebar was moved to
control the part of the schedule that is viewed,
the length of time for which the bar was
manipulated was recorded, and the constant "p"
was added as a value if users were thought to be
performing the action to improve the
perceptibility of missions on the display.
Similarly, we had a predicate named "evaluate" in
the airspace study with four states: OK, abort,
incomplete and wrong. The frequencies of the
latter three helped to point out when sequences of
activity were not progressing well. By also
including as a value the activity name that was
being evaluated, we were able to correlate the
evaluate state information to the activity being
performed - e.g., Evaluate (2-3-socschedule-abort)
[3].

Collection of predicate instances gathers
segments that have been encoded with the same
predicate. In the airspace study, we found this
extremely useful for supporting our error analysis.
We had defined six predicates related to errors.
Using the collection of predicate instances for each
error predicate for each subject, we could easily
determine the number of errors of each type that
occurred, along with the specific error descriptions
and the line number where the predicate was
located in the file. The line number was useful,
as we often needed to go back to the original
encoded file to collect more information on
activities associated with the error.
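
Collecting predicate instances together with their line numbers can be sketched as a simple filter over the lines of an encoded file. The example below assumes each encoded line has the form PREDICATE(value), similar to the encoded sample in Figure 3; the function name, the predicate name EXECUTION_ERROR, and the sample lines are hypothetical.

```python
def collect_instances(lines, predicate):
    """Return (line_number, text) for every encoded line that uses `predicate`.

    Line numbers are 1-based so they can be used to look up the surrounding
    activity in the original encoded file.
    """
    prefix = predicate.upper() + "("
    return [(number, text.strip())
            for number, text in enumerate(lines, start=1)
            if text.strip().upper().startswith(prefix)]

# Hypothetical encoded file contents, invented for illustration only.
encoded_lines = [
    "INTENTION(set-layout)",
    "MENU(view)",
    "EXECUTION_ERROR(wrong-button)",
    "EVALUATE(set-layout-abort)",
    "EXECUTION_ERROR(missed-field)",
]

errors = collect_instances(encoded_lines, "execution_error")
error_count = len(errors)
```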

Use of the error code across studies is also
interesting. In the first two studies, the error
code was used in the traditional way. If the
system responded to a user input with an error
message, the event was coded as a general error. In
the airspace study, we used a more advanced error
coding scheme. By integrating the data from the
verbal protocols on the users' intentions and their
actual inputs, we were able to not only assess
errors of the physical or execution type, but also
those in which the user's sequence of activities
did not correspond with their intentions (errors in
action specification), a type of error that does not
cause the system to generate an error message.

We also had classifications for other cognitive
errors, such as errors in intention, and errors in
perception, interpretation, and evaluation, as well
as the more traditional execution error. This is a
good example of where the encoding process is
itself a form of analysis.

4.5  MRP  Analysis 

The MRP analysis technique was applied
separately to the five participants' collected
unencoded input data files from the airspace
usability study. The data was very detailed, with
each data line containing information about the
user action (pressed, released, typed, moved), the
object type (button, field, scroll bar, time bar),
the specific name of the object (Bravo77, "Add"
button, string typed), and the location of the
object (in Build folder dialog box, in Create/Edit
dialog, etc.). The five data files ranged from
1841 to 3317 lines in length, with 238, 386,
420, 422, and 534 MRPs generated. A sample
MRP is shown in Figure 4.


mrp# 6

0) Released Button on Bravo77 in an Sua Pane
1) Pressed Button on Bravo77 in an Sua Pane
2) Released Button on Bravo77 in an Sua Pane
3) Pressed Button on Sua Description Field in Create/Edit Dialog
4) Pressed Button on Sua Description Field in Create/Edit Dialog
5) Typed "0900" in Sua Description Field in Create/Edit Dialog
6) Pressed Button on Sua Description Field in Create/Edit Dialog
7) Pressed Button on Sua Description Field in Create/Edit Dialog
8) Typed "0000" in Sua Description Field in Create/Edit Dialog
9) Pressed Button on Sua Description Field in Create/Edit Dialog
10) Pressed Button on Sua Description Field in Create/Edit Dialog
11) Typed "0900" in Sua Description Field in Create/Edit Dialog
12) Pressed Button on Sua Description Field in Create/Edit Dialog
13) Pressed Button on Sua Description Field in Create/Edit Dialog
14) Typed "0900" in Sua Description Field in Create/Edit Dialog
15) Pressed Button on Create Request Button in Create/Edit Dialog
16) Pressed Button on OK or Cancel Button in Confirmation Box
17) Pressed Button on Bravo77 in an Sua Pane
18) Released Button on Bravo77 in an Sua Pane
at: 1661 1679 1755

Total number of positions = 3.


Figure  4  Sample  Output  from  the  MRP  Tool 
Showing  a  Single  MRP  of  Length  19,  Occurring 
in  3  Different  Positions. 
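
To give a feel for what an MRP report like the one in Figure 4 contains, the following Python sketch finds repeated action sequences and keeps only those that cannot be lengthened by one action without losing an occurrence. This is a deliberately naive stand-in, not the algorithm of the MRP tool: it uses brute-force enumeration suitable only for short inputs, it allows occurrences to overlap, and its notion of "maximal" is a simplifying assumption. All names and the sample log are hypothetical.

```python
from collections import defaultdict

def repeating_patterns(actions, min_len=2):
    """Map every contiguous pattern that occurs at least twice to its start positions."""
    found = defaultdict(list)
    n = len(actions)
    for length in range(min_len, n):
        for start in range(n - length + 1):
            found[tuple(actions[start:start + length])].append(start)
    return {p: pos for p, pos in found.items() if len(pos) >= 2}

def maximal_patterns(actions, min_len=2):
    """Keep patterns that cannot grow by one action and still occur as often."""
    repeats = repeating_patterns(actions, min_len)
    maximal = {}
    for pattern, positions in repeats.items():
        extended = any(
            len(longer) == len(pattern) + 1
            and len(longer_pos) == len(positions)
            and (longer[:-1] == pattern or longer[1:] == pattern)
            for longer, longer_pos in repeats.items()
        )
        if not extended:
            maximal[pattern] = positions
    return maximal

# Hypothetical unencoded action log, invented for illustration only.
log = ["select", "menu", "approve", "scroll",
       "select", "menu", "approve", "tab",
       "select", "menu", "approve"]

mrps = maximal_patterns(log)
```

For this toy log the only pattern reported is select, menu, approve at positions 0, 4, and 8; the shorter pairs inside it are suppressed because they never occur on their own.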

To assess the MRPs, we tried using the
heuristics provided by Siochi et al. (1991) to
narrow down the number of MRPs that need to
be examined. These included examining the
longest MRPs, the most frequently occurring,
and anomalies departing from the expected
patterns of MRPs (expected patterns are few long
MRPs and many short MRPs). Unfortunately,
this limited set of MRPs did not reveal any
usability problems, and we had to examine every
generated MRP. In general, the more meaningful
patterns seemed to relate to five types of activity:
scroll bar movement, time bar movement, data
editing sequences in dialog boxes, list selection,
selecting or moving the mission icons, and
approving mission sequences.

For some participants' data sets it was harder
to find potential usability problems in the
generated MRPs, because the participants did not
work methodically. Few meaningful repetitious
patterns could be identified among the many
repeating sequences identified. Some repetitious
patterns concerning usability issues could be
seen, however, in the MRPs relating to the
ability to select only individual items from a list,
having to schedule each mission part and each
mission individually, and the dialog box
problem. The dialog box problem was
previously discussed as the case in which
SHAPA generated a high-level cycle showing
dialog boxes being opened and then immediately
closed, but yielding no information on which
dialog box was used. With MRP analysis, some
MRPs were generated showing the actual patterns
of actions for this occurrence for the create/edit
dialog box. To find all the specific occurrences,
however, involves looking across all the MRPs,
because problems of the same type, or even
identical patterns, are not necessarily grouped
together. If the sequence of interest was
sometimes part of a larger repeating sequence,
that larger sequence would be located in a
different MRP. Given the large number of
MRPs generated, it can be difficult to find all
instances.

This technique provides only one potential
indicator of usability problems - that of
repetitive sequences of activities. Problems with
the technique include the random approach to
pattern identification, which precludes frequency
counts of a particular pattern, and the patterns
identified, which are totally context free and
unrelated to any task or user interface sequences.
Many usability problems can only be identified if
user intentions are known, and this technique
will not find those. It also generates a large
amount of output with a lot of noise; e.g., many
MRPs were generated relating to scroll bar
activity or tabbing through data fields.

The technique does have some good points.
Many of the problem specifics missed by
SHAPA's frequency of cycles because values
were not considered were made somewhat
apparent with this analytic technique (particularly
since we knew what to look for), since it was
operating on much more detailed data. The
technique is relatively easy and quick to apply if
the appropriate data can be collected; no data
encoding is required. The program had no trouble
accommodating large data files. Finally, the
command usage statistics could be useful for
providing frequency information at a very detailed
level; the formatting of this particular output,
however, could use some improvement.

5  Conclusions 

We hoped to shed light on the types of
system usability information each of the
sequential data analysis techniques revealed, the
trade-off of questions answered and level of
encoding used, and whether it was worth
applying the techniques. Overall, we conclude
that we did not have a great deal of success in
effectively utilizing most of the sequential data
analysis techniques for analyzing our usability
study data. Many interacting variables affect
what can be learned from application of the
techniques, including the types of data that can be
collected, the encoding scheme used if the data is
encoded, the flexibility with which the SDA
routines can be applied, and the types of usability
problems to be addressed. If we had to rank-order
the techniques discussed here from best to worst
for identifying usability problems based on our
experiences, we would put hierarchical data
encoding as the most useful activity, and MRP
analysis second (because it is quick and easy to
apply), followed by the value list, collection of
predicate instances, frequency of cycles, transition
matrices, and lag sequential analysis. To indicate
overall system usability, the graphical techniques
are somewhat useful.

Nevertheless, we are still attracted to the idea
of pattern analysis and analytic techniques for
analyzing usability data and feel the problems we
encountered are due mostly to limitations of the
currently available software packages -
specifically, their lack of flexibility in specifying
the data parts and levels for the routines to act
on. To effectively utilize routines such as
transition matrices, lag sequential analysis, and
frequency of cycle analysis, the usability analyst
needs to be able to apply the routines at various
levels, without having to recode the data. As we
have shown, we need to be able to identify both
generic and specific patterns in the data with a
single tool. This could be achieved by having
frequency of cycles, lag sequential, and transition
matrices routines operate on both the predicates
and their values, permitting use of wild cards for
particular values. This would provide the
flexibility needed to get at a large variety of
useful patterns, or to follow up leads indicated by
the general patterns.
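
The wild card idea can be illustrated with a minimal Python sketch in which a pattern is a list of (predicate, value) templates and "*" matches anything. This is a hypothetical design for illustration, not a feature of any existing package; the names and sample events are invented.

```python
def matches(event, template):
    """True when each part of the event equals the template part or the template part is '*'."""
    return all(t == "*" or t == e for e, t in zip(event, template))

def find_pattern(events, templates):
    """Return the start positions where consecutive events match the templates in order."""
    width = len(templates)
    return [i for i in range(len(events) - width + 1)
            if all(matches(events[i + k], templates[k]) for k in range(width))]

# Hypothetical encoded events, invented for illustration only.
events = [
    ("interface", "new"), ("command", "save"), ("interface", "new"),
    ("command", "print"), ("interface", "old"), ("command", "save"),
    ("interface", "new"),
]

# Generic pattern: switch interface, issue any command, switch again.
generic = find_pattern(events, [("interface", "*"), ("command", "*"), ("interface", "*")])

# Specific follow-up: the same pattern restricted to the "save" command.
specific = find_pattern(events, [("interface", "*"), ("command", "save"), ("interface", "*")])
```

The same encoded data answers both questions: the generic template finds three switch-command-switch episodes, and narrowing the middle template to "save" shows that two of them involve saving, with no recoding required.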

For the frequency of cycles routine, allowing
identification of both a start and an end predicate
would allow analysts to better define the types of
patterns we want the system to find. There seem
to be at least two types. One is a task activity
pattern in which we might want to specify
task-related start and stop points, such as between
specific user intentions to execute or task
intentions and their corresponding evaluate state.
This would depict activity within a task-domain
cycle. It would also be helpful to be able to
identify user-interface object usage patterns across
task activities. Here we would like to specify a
cycle, such as from dialog box opening to
closing, which would find all dialog box usage
patterns along with the usage of objects
contained within them.
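
A cycle routine with separate start and end predicates might look like the following Python sketch. It is an assumption-laden illustration: the function name is hypothetical, the sample session is invented, and treating "command" as a dialog box opening and "button" as its closing is a simplification made only for this example. The sketch also assumes that the start and end predicates differ.

```python
from collections import Counter

def bounded_cycles(events, start, end):
    """Count each run of predicates from a `start` event to the next `end` event."""
    cycles = []
    current = None
    for predicate in events:
        if predicate == start:
            current = [predicate]      # a new start discards any unfinished run
        elif current is not None:
            current.append(predicate)
            if predicate == end:
                cycles.append(tuple(current))
                current = None
    return Counter(cycles)

# Hypothetical encoded session (predicate names only), invented for illustration.
session = ["intention", "menu", "command", "button", "evaluate",
           "menu",
           "intention", "menu", "command", "field", "button", "evaluate",
           "intention", "menu", "command", "button", "evaluate"]

# Task-domain cycles: from an intention to execute to its evaluation.
task_cycles = bounded_cycles(session, "intention", "evaluate")

# Interface-object cycles: from a dialog opening to its closing.
dialog_cycles = bounded_cycles(session, "command", "button")
```

With these inputs the dialog-level query shows two open-then-close cycles with no field access and one cycle in which a field was used, independent of the task boundaries.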

The problem with the MRP routines which
work on the unencoded command files is the loss
of context or user intention information. The
data cannot be easily aggregated along task lines,
and the users' goals and intentions are not known.
This makes identification of many types of
usability problems very difficult. Siochi et al.
(1991) had to supplement their MRP analysis of
the GIPSY system by interviewing the users.

When using verbal protocols in conjunction
with data logging techniques, the user's thought
processes can be extracted to supplement the
logged mouse/keystroke data during encoding;
this puts structure on what would have been
otherwise difficult to interpret data. The process
of encoding the data was found to be the most
useful analytic activity, particularly in the
airspace study, where we used codings that allowed
us to hierarchically break down the user input
sequence into goals, tasks, intentions to execute,
actual sequences of inputs within each task and
execute intention, and evaluation of each activity.
We also learned much from the detailed error
codes used. This coding scheme did not lend
itself to use of SHAPA's SDA techniques, but
neither did the coding schemes we tried for the
other studies. Moreover, with the encoded data in
this easy-to-read form, patterns were easily
detectable by the usability analyst. In fact, it
was easier first to manually detect patterns, then
figure out which SDA analysis routine to run and
with what parameters, and finally run the SDA
routines to generate hard frequency counts. To be
able to say a repetitive pattern occurred 62 times
in 90 minutes creates much more impact than
just noting that such a pattern exists.

During the airspace study we also identified
and manually extracted other measures of interest
that we felt reflected system usability but were
not supported by any software packages, such as
the number of computer actions per intention to
execute. Some patterns of user activity that could be
recognized by human analysis might not be
identified by a software pattern recognizer because
they do not repeat exactly or regularly. For
instance, users often looked up information on a
mission icon in a dialog box before scheduling
it, but did not always do so sequentially or with
the same exact set of actions; also, the mission
icon was different in every case. The computer
programs do not identify these as repetitive
activities.
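
A measure such as the number of computer actions per intention to execute is straightforward to derive once the data is hierarchically encoded. The Python sketch below is a hedged illustration only: the function name, the set of predicates counted as input actions, and the sample session are assumptions made for this example and do not describe how we extracted the measure manually.

```python
def actions_per_intention(events, marker="intention",
                          inputs=frozenset({"icon", "menu", "command", "field", "button"})):
    """Return the number of input actions that follow each `marker` event.

    Events before the first marker, and events that are not input actions
    (such as "evaluate"), are not counted.
    """
    counts = []
    for predicate in events:
        if predicate == marker:
            counts.append(0)
        elif counts and predicate in inputs:
            counts[-1] += 1
    return counts

# Hypothetical encoded session (predicate names only), invented for illustration.
session = ["intention", "icon", "menu", "command", "evaluate",
           "intention", "menu", "command", "field", "field", "button", "evaluate"]

per_intention = actions_per_intention(session)
average_actions = sum(per_intention) / len(per_intention)
```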
Software to support application of SDA
techniques for usability testing is still in its
infancy. As programs become more flexible and
powerful, and usability analysts identify
measures and routines of interest and use to
them, these tools should become more effective.